33 research outputs found

    Fast Locality-Sensitive Hashing Frameworks for Approximate Near Neighbor Search

    Full text link
    The Indyk-Motwani Locality-Sensitive Hashing (LSH) framework (STOC 1998) is a general technique for constructing a data structure to answer approximate near neighbor queries by using a distribution H\mathcal{H} over locality-sensitive hash functions that partition space. For a collection of nn points, after preprocessing, the query time is dominated by O(nρlogn)O(n^{\rho} \log n) evaluations of hash functions from H\mathcal{H} and O(nρ)O(n^{\rho}) hash table lookups and distance computations where ρ(0,1)\rho \in (0,1) is determined by the locality-sensitivity properties of H\mathcal{H}. It follows from a recent result by Dahlgaard et al. (FOCS 2017) that the number of locality-sensitive hash functions can be reduced to O(log2n)O(\log^2 n), leaving the query time to be dominated by O(nρ)O(n^{\rho}) distance computations and O(nρlogn)O(n^{\rho} \log n) additional word-RAM operations. We state this result as a general framework and provide a simpler analysis showing that the number of lookups and distance computations closely match the Indyk-Motwani framework, making it a viable replacement in practice. Using ideas from another locality-sensitive hashing framework by Andoni and Indyk (SODA 2006) we are able to reduce the number of additional word-RAM operations to O(nρ)O(n^\rho).Comment: 15 pages, 3 figure

    Algorithms for Stable Matching and Clustering in a Grid

    Full text link
    We study a discrete version of a geometric stable marriage problem originally proposed in a continuous setting by Hoffman, Holroyd, and Peres, in which points in the plane are stably matched to cluster centers, as prioritized by their distances, so that each cluster center is apportioned a set of points of equal area. We show that, for a discretization of the problem to an n×nn\times n grid of pixels with kk centers, the problem can be solved in time O(n2log5n)O(n^2 \log^5 n), and we experiment with two slower but more practical algorithms and a hybrid method that switches from one of these algorithms to the other to gain greater efficiency than either algorithm alone. We also show how to combine geometric stable matchings with a kk-means clustering algorithm, so as to provide a geometric political-districting algorithm that views distance in economic terms, and we experiment with weighted versions of stable kk-means in order to improve the connectivity of the resulting clusters.Comment: 23 pages, 12 figures. To appear (without the appendices) at the 18th International Workshop on Combinatorial Image Analysis, June 19-21, 2017, Plovdiv, Bulgari

    Computing the Fréchet Distance with a Retractable Leash

    Get PDF
    All known algorithms for the Fréchet distance between curves proceed in two steps: first, they construct an efficient oracle for the decision version; second, they use this oracle to find the optimum from a finite set of critical values. We present a novel approach that avoids the detour through the decision version. This gives the first quadratic time algorithm for the Fréchet distance between polygonal curves in (Formula presented.) under polyhedral distance functions (e.g., (Formula presented.) and (Formula presented.)). We also get a (Formula presented.)-approximation of the Fréchet distance under the Euclidean metric, in quadratic time for any fixed (Formula presented.). For the exact Euclidean case, our framework currently yields an algorithm with running time (Formula presented.). However, we conjecture that it may eventually lead to a faster exact algorithm

    SeqAn An efficient, generic C++ library for sequence analysis

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.</p> <p>Results</p> <p>To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.</p> <p>Conclusion</p> <p>We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.</p

    DYNAMIC PARTITION TREES

    No full text
    In this paper we study dynamic variants of conjugation trees and related structures that have recently been introduced for performing various types of queries on sets of points and line segments, like half-planar range searching, shooting, intersection queries, etc. For most of these types of queries dynamic structures are obtained with an amortized update time of O(log2 n) (or less) with only minor increases in query times. As an application of the method we obtain an output-sensitive method for hidden surface removal in a set of n triangles that runs in time 0(nlog n + n.kg-gamma) where gamma = log2 ((1 + square-root 5)/2) almost-equal-to 0.695 and k is the size of the visibility map obtained

    Hidden surface removal for axis-parallel polyhedra

    No full text
    An efficient, output-sensitive method for computing the visibility map of a set of axis-parallel polyhedra (i.e. polyhedra with their faces and edges parallel to the coordinate axes) as seen from a given viewpoint is introduced. For nonintersecting polyhedra with n edges in total, the algorithm runs in time O((n+k)log n), where k is the complexity of the visibility map. The method can handle cyclic overlap of the polyhedra and perspective views without any problem. For c-oriented polyhedra (with faces and edges in c orientations, for some constant c) the method can be extended to run in the same time bound. The method can be extended even further to deal with intersecting polyhedra with only a slight increase in the time bound
    corecore